Compaction Techniques for Nextword Indexes

Authors

  • Dirk Bahle
  • Hugh E. Williams
  • Justin Zobel
Abstract

Most queries to text search engines are ranked or Boolean. Phrase querying is a powerful technique for refining searches, but is expensive to implement on conventional indexes. In other work, a nextword index has been proposed as a structure specifically designed for phrase queries. Nextword indexes are, however, relatively large. In this paper we introduce new compaction techniques for nextword indexes. In contrast to most index compression schemes, these techniques are lossy, yet, as we show, they allow full resolution of phrase queries without false match checking. We show experimentally that our novel techniques lead to significant savings in index size.
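
To make the structure concrete, the following is a minimal, uncompacted sketch of a nextword index and of phrase resolution over it. It is an illustration only and not the compaction scheme introduced in the paper: the function names `build_nextword_index` and `phrase_query`, the in-memory dictionary layout, and the whitespace tokenisation are all assumptions made for this example.

```python
from collections import defaultdict


def build_nextword_index(docs):
    """Map firstword -> nextword -> list of (doc_id, position of firstword).

    Hypothetical in-memory layout, chosen for illustration only.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        words = text.lower().split()  # naive tokenisation (assumption)
        for pos in range(len(words) - 1):
            index[words[pos]][words[pos + 1]].append((doc_id, pos))
    return index


def phrase_query(index, phrase):
    """Return sorted (doc_id, start_position) matches for a phrase of two or more words."""
    words = phrase.lower().split()
    if len(words) < 2:
        raise ValueError("this sketch handles phrases of two or more words")
    # Candidate start positions come from the first word pair.
    starts = set(index.get(words[0], {}).get(words[1], []))
    # Each later pair (words[i], words[i + 1]) must occur at offset i from the start.
    for i in range(1, len(words) - 1):
        pair = set(index.get(words[i], {}).get(words[i + 1], []))
        starts = {(d, p) for (d, p) in starts if (d, p + i) in pair}
    return sorted(starts)


docs = [
    "the quick brown fox",
    "a quick brown dog and the quick brown fox",
]
index = build_nextword_index(docs)
matches = phrase_query(index, "quick brown fox")
```

Because every word pair is stored with its positions, a phrase is resolved by intersecting the position lists of consecutive pairs. Storing a list for every distinct pair is also why such an index tends to be large, which is the cost the compaction techniques aim to reduce.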


Related articles

Efficient Phrase Querying with an Auxiliary Index

Search engines need to evaluate queries extremely fast, a challenging task given the vast quantities of data being indexed. A significant proportion of the queries posed to search engines involve phrases. In this paper we consider how phrase queries can be efficiently supported with low disk overheads. Previous research has shown that phrase queries can be rapidly evaluated using nextword index...


What's Next? Index Structures for Efficient Phrase Querying

Text retrieval systems are used to fetch documents from large text collections, using queries consisting of words and word sequences. A shortcoming of current systems is that word-sequence queries, also known as phrase queries, can be expensive to evaluate, particularly if they include common words. Another limitation is that some forms of querying are not supported; an example is phrase comple...


Optimised Phrase Querying and Browsing of Large Text Databases

Most search systems for querying large document collections—for example, web search engines—are based on well-understood information retrieval principles. These systems are both efficient and effective in finding answers to many user information needs, expressed through informal ranked or structured Boolean queries. Phrase querying and browsing are additional techniques that can augment or repl...


Compaction of Coarse-Textured Soils: Balance Models across Mineral and Organic Compositions

Soil bulk density (BD), degree of compactness (DC), maximum bulk density (MBD), and critical water content (CWC) at which MBD is reached are commonly used to characterize soil compaction, and can be predicted from soil texture and organic matter content, omitting other components such as sand sub-classes and soil cementing agents and potential biases such as data redundancy and sub-compositiona...



Publication date: 2001